Thinking beyond cut-off scores in the assessment of potentially addictive behaviors: A brief illustration in the context of binge-watching

Abstract

While applying a diagnostic approach (i.e., comparing “clinical” cases with “healthy” controls) is part of our methodological habits as researchers and clinicians, this approach has been particularly criticized in the behavioral addictions research field, in which many studies are conducted on “emerging” conditions. Here we exemplify the pitfalls of using a cut-off-based approach in the context of binge-watching (i.e., watching multiple episodes of series back-to-back) by demonstrating that no reliable cut-off scores could be determined with a widely used assessment instrument measuring binge-watching.

We capitalized on an international data set comprising 12,616 responses to the Binge-Watching Engagement and Symptoms Questionnaire (BWESQ) from series viewers (Flayelle, Castro-Calvo, et al., 2020). We applied the criteria from prior work on binge-watching (Billaux, Billieux, Gärtner, Maurage, & Flayelle, 2022; Flayelle, Verbruggen, et al., 2020)¹ to distinguish three groups: 1) non-binge-watchers (n = 2,642), with a typical viewing session comprising fewer than three episodes and lasting less than 2 h, with neither a reported functional impact caused by series watching nor self-identification as problematic series viewers; 2) trouble-free binge-watchers (n = 2,345), with a typical viewing session comprising three or more episodes and lasting at least 2 h, without reporting a functional impact caused by series watching and without self-identifying as problematic series viewers; and 3) problematic binge-watchers (n = 2,996), with a typical viewing session comprising three or more episodes and lasting at least 2 h, with a reported functional impact caused by series watching. This classification approach resulted in a final sample size of 7,983 participants (age M (SD) = 24.19 (7.91), 70.90% female). We thus excluded the remaining 4,633 participants who did not fulfill the criteria for any of the three groups (e.g., participants who typically watched less than two episodes but for more than 2 h). However, because cut-off scores aim at dissociating clinical from non-clinical populations, we merged non-binge-watchers and trouble-free binge-watchers into one group of non-problematic TV series viewers (n = 4,987, age M (SD) = 24.74 (8.49), 67.70% female), as opposed to the group of problematic binge-watchers (n = 2,996, age M (SD) = 23.28 (6.74), 76.30% female).
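The grouping rules described above can be sketched in code. This is only an illustrative reading of the stated criteria, not the authors' actual procedure: the function name, argument names, and group labels are hypothetical, and the original data set may encode these variables differently.

```python
from typing import Optional


def classify_viewer(
    episodes_per_session: int,
    hours_per_session: float,
    functional_impact: bool,
    self_identified_problematic: bool,
) -> Optional[str]:
    """Assign a respondent to one of the three groups, or None if excluded.

    All argument names are hypothetical stand-ins for the survey variables.
    """
    binge_pattern = episodes_per_session >= 3 and hours_per_session >= 2
    light_pattern = episodes_per_session < 3 and hours_per_session < 2
    trouble_free = not functional_impact and not self_identified_problematic

    if light_pattern and trouble_free:
        return "non-binge-watcher"
    if binge_pattern and trouble_free:
        return "trouble-free binge-watcher"
    if binge_pattern and functional_impact:
        return "problematic binge-watcher"
    # Mixed profiles (e.g., few episodes but long sessions) fit no group.
    return None


def is_problematic(group: str) -> bool:
    """Collapse the three groups into the binary contrast used for cut-offs."""
    return group == "problematic binge-watcher"
```

Under this reading, a respondent who watches one episode for 2.5 h per session returns `None` and would be among the excluded participants.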
We conducted accuracy analyses for each of the seven BWESQ facets: binge-watching (e.g., "I always need to watch more episodes to feel satisfied"), dependency (e.g., "I am usually in a bad mood, sad, depressed or annoyed when I can't watch any TV series, and I feel better when I am able to watch them again"), desire/savoring (e.g., "I get really excited when a new episode is released"), engagement (e.g., "In my opinion, TV series are a part of my life and they contribute to my welfare"), loss of control (e.g., "I watch more TV series than I should"), pleasure preservation (e.g., "I worry about getting spoiled"), and positive emotions (e.g., "Watching TV series is a cause for joy and enthusiasm in my life").
Using SPSS 27.0 (IBM, Corp.), we first assessed diagnostic accuracy with area under the curve (AUC) analyses of receiver operating characteristic (ROC) curves, following diagnostic accuracy guidelines (i.e., AUC < 0.70 implying low accuracy, AUC ≥ 0.70 and < 0.90 indicating moderate diagnostic accuracy, and AUC ≥ 0.90 corresponding to high diagnostic accuracy; Swets, 2014). Results indicated low or close to low accuracy for the following five facets: engagement (AUC = 0.70), dependency (AUC = 0.68), desire/savoring (AUC = 0.72), positive emotions (AUC = 0.66), and pleasure preservation (AUC = 0.62). Because loss of control (AUC = 0.82) and binge-watching (AUC = 0.81) had moderate diagnostic accuracy, we conducted further accuracy analyses: specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV). As observed in Figs. 1 and 2, and based on accuracy indices for each of the curve coordinates (see Appendixes A and B), a cut-off score of 15.50 (corresponding to an actual score of 16) optimizes the accuracy of both subscales while minimizing false positives (unlike cut-off values below 15.50). For the loss of control facet, this threshold yields a poor sensitivity of 54.40% (i.e., 45.60% false negatives), a more than acceptable specificity of 89.30%, a medium PPV of 75.30%, and a medium NPV of 76.50%. For the binge-watching facet, this threshold yields poor sensitivity (56.10%, i.e., 43.90% false negatives), good specificity (86.30%), and a medium PPV (71.20%) and NPV (76.60%). This implies that if clinicians were to use either the binge-watching or loss of control subscale for screening purposes, approximately 30% of respondents labeled as presenting problematic binge-watching would be misclassified (Maraz, Király, & Demetrovics, 2015).
Considering such a substantial likelihood of generating false positives, we therefore cannot reasonably recommend the use of cut-off values for the binge-watching and loss of control facets of the BWESQ.
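To make the link between these indices explicit, the following minimal sketch derives PPV and NPV from sensitivity, specificity, and the two group sizes reported above (2,996 problematic and 4,987 non-problematic viewers). The function name is hypothetical, and the computation is a standard textbook one, not a reproduction of the SPSS output. Because it starts from rounded percentages, its results can differ from the reported values in the last decimal.

```python
def predictive_values(sensitivity, specificity, n_positive, n_negative):
    """Return (PPV, NPV) implied by sensitivity, specificity, and group sizes."""
    true_pos = sensitivity * n_positive
    false_neg = n_positive - true_pos
    true_neg = specificity * n_negative
    false_pos = n_negative - true_neg
    ppv = true_pos / (true_pos + false_pos)  # share of positive screens that are correct
    npv = true_neg / (true_neg + false_neg)  # share of negative screens that are correct
    return ppv, npv


# Group sizes and accuracy indices as reported in the text.
N_PROBLEMATIC, N_NON_PROBLEMATIC = 2996, 4987

loss_of_control = predictive_values(0.544, 0.893, N_PROBLEMATIC, N_NON_PROBLEMATIC)
binge_watching = predictive_values(0.561, 0.863, N_PROBLEMATIC, N_NON_PROBLEMATIC)

for name, (ppv, npv) in [("loss of control", loss_of_control),
                         ("binge-watching", binge_watching)]:
    # 1 - PPV is the share of positive screens that would be false positives.
    print(f"{name}: PPV={ppv:.3f}, NPV={npv:.3f}, 1-PPV={1 - ppv:.3f}")
```

Note that PPV and NPV depend on the proportion of problematic cases in the sample (here about 38%), so the same sensitivity and specificity would give a lower PPV in a population where problematic binge-watching is rarer.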
In summary, the current results indicate that no reliable BWESQ cut-off scores could be determined to accurately discriminate problematic from non-problematic binge-watchers. They also suggest that applying such a diagnostic approach might not be the most relevant in the context of binge-watching behaviors. Notably, since most putative behavioral addictions (except gambling and gaming disorders) are not yet recognized as such in international diagnostic classifications, the current lack of established diagnostic criteria for problematic and potentially addictive engagement in these activities prevents the generation of reliable cut-off scores. This is why researchers and clinicians should, at this stage, refrain from proposing cut-off scores in new scales that assess emerging problematic behaviors, including binge-watching. Indeed, previous attempts to suggest cut-offs for such scales (e.g., in the context of "Internet addiction") resulted in unrealistic prevalence rates (up to 10%-20% of "pathological cases"; e.g., Kuss, Griffiths, Karila, & Billieux, 2014), thus promoting over-pathologization, stigmatization, and moral panic. Efforts should instead be focused on developing a strong research base to clarify where the dividing line between elevated but non-harmful and problematic patterns of engagement resides. Clinically useful assessment criteria could then be derived, thus allowing for the generation of valid cut-off scores for measurement instruments specifically designed for this purpose.

¹ See Flayelle, Verbruggen, et al. (2020) for the rationale behind the selection of criteria used for creating the three groups.
It is worth noting that determining reliable cut-off scores for self-reported screening tools (such as the BWESQ) requires a gold standard (e.g., a diagnostic interview administered by a certified clinician), which was not possible in the present context as binge-watching is not a recognized condition. We also want to point out that the identification of problematic behaviors should go beyond the use of a single cut-off, and that different cut-offs could be used for different purposes. For example, we could opt for one cut-off if our aim is to reduce the number of false positives to avoid over-pathologization effects, and for a different one if, in contrast, our objective is to reduce false negatives as far as possible to ensure that most persons in need of help are correctly identified via the screening instrument. Finally, future studies could also apply other statistical approaches (e.g., supervised machine learning) to identify optimal cut-off scores based on a selection of theoretically informed variables.
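The idea that different purposes call for different cut-offs can be illustrated with a short sketch. It computes sensitivity and specificity at every candidate cut-off, then selects one cut-off under a specificity target (to limit false positives) and another under a sensitivity target (to limit false negatives). The function names, the target values, and the toy scores are all hypothetical and chosen purely for illustration; they are not BWESQ data.

```python
def roc_coordinates(scores, labels):
    """Return (cutoff, sensitivity, specificity) for each observed score.

    A respondent screens positive when score >= cutoff; labels are 1
    (problematic) or 0 (non-problematic).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    coords = []
    for cutoff in sorted(set(scores)):
        sens = sum(s >= cutoff for s in positives) / len(positives)
        spec = sum(s < cutoff for s in negatives) / len(negatives)
        coords.append((cutoff, sens, spec))
    return coords


def cutoff_for_specificity(coords, min_specificity):
    """Lowest cut-off meeting the specificity target (keeps sensitivity highest)."""
    eligible = [c for c in coords if c[2] >= min_specificity]
    return min(eligible, key=lambda c: c[0]) if eligible else None


def cutoff_for_sensitivity(coords, min_sensitivity):
    """Highest cut-off meeting the sensitivity target (keeps specificity highest)."""
    eligible = [c for c in coords if c[1] >= min_sensitivity]
    return max(eligible, key=lambda c: c[0]) if eligible else None


# Made-up toy data for illustration only.
toy_scores = [6, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 20]
toy_labels = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

coords = roc_coordinates(toy_scores, toy_labels)
strict = cutoff_for_specificity(coords, min_specificity=0.85)   # fewer false positives
lenient = cutoff_for_sensitivity(coords, min_sensitivity=0.95)  # fewer false negatives
print("strict cut-off:", strict)
print("lenient cut-off:", lenient)
```

On these toy data the two goals lead to different thresholds, which is the point of the passage above: no single cut-off serves both a conservative and a sensitive screening aim.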